Combining RapidMiner operators with bioinformatics services – a powerful combination

Authors

  • Simon Jupp
  • James Eales
  • Simon Fischer
  • Sebastian Land
  • Rishi Ramgolam
  • Alan Williams
  • Robert Stevens
Abstract

Knowledge discovery through pattern finding in data is central to modern molecular biology, which now has thousands of databases and similar numbers of tools for processing those data. Any data analysis in molecular biology involves gathering and processing data from many sources, even before the analysis addressing the central biological question takes place. Taverna is a workflow workbench that allows bioinformaticians to create data pipelines involving distributed Web services and other kinds of tools; these workflows gather and manage data in order to perform analyses that answer biological questions. RapidMiner brings a large suite of data processing, visualisation and data mining tools to bear upon tables of data, but there is a disconnect between these operators and the services available to users of Taverna. Through a RapidMiner extension to Taverna we have combined the ability to gather and process data from many molecular biological sources with RapidMiner’s data mining capabilities to provide a powerful tool for scientific analysis. In this article we describe this RapidMiner extension to Taverna and some preliminary analyses we have performed using RapidMiner on biological data.

Similar resources

Web services and data mining: combining linguistic tools for Polish with an analytical platform

In this paper we present a new combination of existing language tools for Polish with a popular data mining platform intended to help researchers in the digital humanities perform computational analyses without any programming. The toolset includes RapidMiner Studio, a software solution offering graphical setup of integrated analytical processes and Multiservice, a Web service offering access to ...

Mining the Web of Linked Data with RapidMiner

Lots of data from different domains is published as Linked Open Data (LOD). While there are quite a few browsers for such data, as well as intelligent tools for particular purposes, a versatile tool for deriving additional knowledge by mining the Web of Linked Data is still missing. In this system paper, we introduce the RapidMiner Linked Open Data extension. The extension hooks into the powerf...

Feature Selection for high-dimensional data with RapidMiner

The number of recorded features has grown exponentially over the last years. In the bioinformatics domain, datasets with hundreds of thousands of features are no longer unusual. Extracting knowledge from such huge heaps of data demands new methods. Traditional methods that are applicable to low-dimensional data, like wrapper feature selection, can no longer handle the growing number of features....

COMBINING FUZZY QUANTIFIERS AND NEAT OPERATORS FOR SOFT COMPUTING

This paper will introduce a new method to obtain the order weights of the Ordered Weighted Averaging (OWA) operator. We will first show the relation between fuzzy quantifiers and neat OWA operators and then offer a new combination of them. Fuzzy quantifiers are applied for soft computing in modeling the optimism degree of the decision maker. In using neat operators, the ordering of the inputs is not...

"Semantics Inside!" But Let's Not Tell the Data Miners: Intelligent Support for Data Mining

Knowledge Discovery in Databases (KDD) has evolved significantly over the past years and reached a mature stage offering plenty of operators to solve complex data analysis tasks. User support for building data analysis workflows, however, has not progressed sufficiently: the large number of operators currently available in KDD systems and interactions between these operators complicate success...

Publication date: 2011